242 research outputs found
Relative Importance Sampling For Off-Policy Actor-Critic in Deep Reinforcement Learning
Off-policy learning is less stable than on-policy learning in
reinforcement learning (RL). One reason for this instability is a
discrepancy between the target (π) and behavior (b) policy
distributions. The discrepancy between the π and b distributions can be
alleviated by employing a smooth variant of importance sampling (IS), such
as relative importance sampling (RIS). RIS has a parameter
that controls smoothness. To cope with instability, we present the first
relative importance sampling off-policy actor-critic (RIS-Off-PAC) model-free
algorithms in RL. In our method, the network yields a target policy (the
actor) and a value function (the critic) that assesses the current policy (π)
using samples drawn from the behavior policy. To train our algorithm, we use
the action value generated from the behavior policy, rather than from
the target policy, in the reward function. We also use deep neural networks to train both actor and
critic. We evaluated our algorithm on a number of OpenAI Gym benchmark
problems and demonstrate better or comparable performance to several
state-of-the-art RL baselines.
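The abstract above does not state the RIS formula. As a minimal sketch, assuming the standard relative density-ratio form, in which the target probability is divided by a mixture of the target and behavior probabilities, the smoothing effect can be illustrated as follows. The function name, the parameter name `beta`, and the example probabilities are hypothetical and are not taken from the paper.

```python
def relative_importance_weight(pi_prob, b_prob, beta):
    """Relative importance weight pi / (beta * pi + (1 - beta) * b).

    Assumed form (standard relative density ratio), not confirmed by the
    abstract. beta = 0 recovers the ordinary importance weight pi / b;
    for beta > 0 the weight can never exceed 1 / beta.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    mixture = beta * pi_prob + (1.0 - beta) * b_prob
    if mixture <= 0.0:
        raise ValueError("mixture probability must be positive")
    return pi_prob / mixture


# Hypothetical probabilities of one action under the target and behavior policies.
pi_prob, b_prob = 0.9, 0.01

ordinary = relative_importance_weight(pi_prob, b_prob, beta=0.0)  # plain IS ratio
smoothed = relative_importance_weight(pi_prob, b_prob, beta=0.5)  # bounded by 1 / 0.5

print(ordinary, smoothed)
```

When the behavior policy rarely takes an action that the target policy favors, the ordinary ratio becomes very large, while the smoothed weight stays below 1 / beta. This is one way to read the claim that the parameter controls smoothness.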
MatchZoo: A Learning, Practicing, and Developing System for Neural Text Matching
Text matching is the core problem in many natural language processing (NLP)
tasks, such as information retrieval, question answering, and conversation.
Recently, deep learning technology has been widely adopted for text matching,
making neural text matching a new and active research domain. With a large
number of neural matching models emerging rapidly, it becomes increasingly
difficult for researchers, especially newcomers, to learn and understand
these new models. Moreover, it is usually difficult to try these models because of
tedious data pre-processing, complicated parameter configuration, and
numerous optimization tricks, not to mention the occasional unavailability of
public code. Finally, for researchers who want to develop new models, it is
not easy to implement a neural text matching model from scratch and to
compare it with many existing models. In this paper, therefore, we present a
novel system, namely MatchZoo, to facilitate the learning, practicing and
designing of neural text matching models. The system consists of a powerful
matching library and a user-friendly and interactive studio, which can help
researchers: 1) to learn state-of-the-art neural text matching models
systematically; 2) to train, test, and apply these models with simple
configurable steps; and 3) to develop their own models with rich APIs and
assistance.
Parameter Estimation with the Ordered Regularization via an Alternating Direction Method of Multipliers
Regularization is a popular technique in machine learning for model
estimation and avoiding overfitting. Prior studies have found that modern
ordered regularization can be more effective in handling highly correlated,
high-dimensional data than traditional regularization. The reason stems from
the fact that the ordered regularization can reject irrelevant variables and
yield an accurate estimation of the parameters. How to scale
ordered regularization problems to large-scale training data remains an
open question. This paper explores the problem of parameter estimation
with ordered regularization via the Alternating Direction Method of
Multipliers (ADMM), called ADMM-O. The advantages of ADMM-O
include (i) scaling ordered regularization up to large-scale datasets, (ii)
predicting parameters correctly by excluding irrelevant variables
automatically, and (iii) having a fast convergence rate. Experimental results on
both synthetic and real data indicate that ADMM-O can perform
better than or comparably to several state-of-the-art baselines.
NSME: a framework for network worm modeling and simulation
Various worms have a devastating impact on the Internet. Packet-level network modeling and simulation has become an approach to finding effective countermeasures against the worm threat. However, current alternatives are not well suited to this purpose. For instance, they mostly focus on the details of the lower layers of the network, so their abstraction of the application layer is very coarse.
In our work, we propose a formal description of network and worm models, and define network virtualization levels to differentiate the expressive capability of current alternatives. We then implement a framework, called NSME, based on NS2 for dedicated worm modeling and simulation with more detail at the application layer. We also analyze and compare the resulting overheads. The additional real-time characteristics and a worm simulation model are further discussed.
5th IFIP International Conference on Network Control & Engineering for QoS, Security and Mobility
Red de Universidades con Carreras en Informática (RedUNCI)
- …